Detailed protocol


Here we provide the datasets used to generate the main figures of the paper "Machine learning enables prediction of metabolic system evolution in
bacteria" (10.1126/sciadv.adc9130). The zip file contains two files: a phylogenetic tree and the gene presence/absence information for every extant and
ancestral species.


• /Bio-protocol-data/Bacterial_tree.nwk : A Newick format file of the bacterial reference phylogeny extracted from the GTDB reference phylogeny (r89).

• /Bio-protocol-data/KEGGOG_Species_PresenceAbsence list : A TSV file of the presence/absence profile of every ortholog group (OG) for every tip
node (extant species) and every internal node (ancestor) of '/Bio-protocol-data/Bacterial_tree.nwk'. This file has one row for each pair of an OG and an
internal/tip node. The first, second, and third columns of each row give the OG name (KEGG ortholog group), the node name, and the presence/absence
state, respectively. The presence/absence state is represented as 1 (present) or 0.5 (uncertain; for ancestors). If there is no row for a pair of an OG and a
node in the phylogeny, the OG is absent in the genome corresponding to that node.
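
As an illustration of the row layout described above, the following is a minimal Python sketch of one way to read the presence/absence file, not the code used by Evodictor. It assumes the file has no header line and exactly three tab-separated columns; the function names and the sample rows are hypothetical and are not taken from the dataset.

```python
import csv
import io
from collections import defaultdict


def load_presence_absence(handle):
    """Build a mapping: node name -> {OG name: state}, with state 1.0 or 0.5.

    Assumes no header line and three tab-separated columns
    (OG name, node name, state), as described in the text.
    """
    profile = defaultdict(dict)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 3:
            continue  # skip blank or incomplete lines
        og, node, state = row[0], row[1], float(row[2])
        profile[node][og] = state
    return profile


def og_state(profile, node, og):
    """Return 1.0 (present), 0.5 (uncertain), or 0.0 (absent, i.e. no row)."""
    return profile.get(node, {}).get(og, 0.0)


# Made-up rows for illustration only (hypothetical OG and node names).
sample = io.StringIO(
    "K00001\tspeciesA\t1\n"
    "K00001\tancestor1\t0.5\n"
    "K00002\tspeciesA\t1\n"
)
profile = load_presence_absence(sample)
print(og_state(profile, "speciesA", "K00001"))
print(og_state(profile, "ancestor1", "K00001"))
print(og_state(profile, "ancestor1", "K00002"))  # no row, so treated as absent
```

To read the real file, pass an open file handle (for example `open(path, newline="")`) instead of the in-memory sample. Storing only the rows that exist and defaulting to 0.0 mirrors the file's own convention that a missing row means the OG is absent.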


The files above were used directly as input files for Evodictor in the study (10.1126/sciadv.adc9130).